DYNAMICAL FUNCTIONAL PREDICTION AND
CLASSIFICATION, WITH APPLICATION TO TRAFFIC FLOW
PREDICTION

By Jeng-Min Chiou 

Academia Sinica 

Motivated by the need for accurate traffic flow prediction in trans- 
portation management, we propose a functional data method to ana- 
lyze traffic flow patterns and predict future traffic flow. In this study 
we approach the problem by sampling traffic flow trajectories from 
a mixture of stochastic processes. The proposed functional mixture 
prediction approach combines functional prediction with probabilistic 
functional classification to take distinct traffic flow patterns into ac- 
count. The probabilistic classification procedure, which incorporates 
functional clustering and discrimination, hinges on subspace projection.
The proposed methods not only assist in predicting traffic flow
trajectories, but also identify distinct patterns in daily traffic flow 
of typical temporal trends and variabilities. The proposed methodol- 
ogy is widely applicable in analysis and prediction of longitudinally 
recorded functional data. 

1. Introduction. Traffic flow is an important macroscopic traffic characteristic
in transportation systems. The measurement and forecasting of
traffic flow are crucial in the design, planning and operations of highway 
facilities [Zhang and Ye (2008)]. Traffic flow can be measured automatically 
using various types of vehicle detectors such as the commonly used dual loop 
detectors, which are installed in certain roads at regular intervals. Real-time 
traffic flow information in conjunction with historical traffic flow records 
makes it possible to predict traffic flow in the short term. The importance 
of traffic prediction for intelligent transportation systems has long been rec- 
ognized in many applications, including the development of traffic control 
strategies in advanced traffic management systems and real-time route guid- 
ance in advanced traveler information systems [Zheng, Lee and Shi (2006)]. 
However, dynamic features of traffic flow, along with unstable traffic conditions
and unpredictable environmental factors, contribute to the challenge
of pursuing accuracy in predictions. 

Short-term traffic flow prediction has been intensively investigated for 
more than two decades and various types of methodologies have been devel- 
oped. These include time series models [e.g., Williams and Hoel (2003), 
Stathopoulos and Karlaftis (2003)], Kalman filtering methods [e.g., Xie, 
Zhang and Ye (2007), Okutani and Stephanides (1984)], local linear regres- 
sion models [Sun et al. (2003)], neural network based methods [e.g., Chen 
and Grant-Muller (2001), Zheng, Lee and Shi (2006), Çetiner, Sari and Borat
(2010)] and fuzzy neural models and fuzzy logic system methods [Yin et al. 
(2002), Zhang and Ye (2008)], among others. In addition, there are many 
articles comparing parametric time series models, nonparametric regression 
models and neural networks in traffic prediction, such as in Kirby, Watson
and Dougherty (1997), Smith and Demetsky (1997) and Smith, William and 
Oswald (2002). More recently, Kamarianakis, Shen and Wynter (2012) dis- 
cussed road traffic forecasting for highway networks using fully parametric 
regime-switching space-time models, coupled with a penalized estimation 
scheme. To our knowledge, a functional data approach to predicting traffic 
flow has not yet been investigated in the literature. 

1.1. Illustration of traffic flow prediction and the proposed functional data 
method. Motivated by a practical need for accurate traffic flow prediction, 
we develop a novel functional data method for predicting future, or unob- 
served, daily traffic flow for an up-to-date and partially observed traffic flow 
trajectory. Figure 1 illustrates a sample of daily traffic flow trajectories. The 
data were collected by a dual loop vehicle detector located near Shea-San
Tunnel on National Highway 5 in Taiwan in 2009 and are based on the flow
rates (vehicle count per min) over 15-min time intervals, a metric suggested
in Highway Capacity Manual 2000 for operational analyses [Zheng, Lee and
Shi (2006)]. A sample of 70 days of trajectories serves as the training data,
while the remaining 14 days are used as the test data to validate the
prediction performance. The aim is to predict the unobserved traffic flow
trajectory for a partial trajectory with updated flow information up to the
"current time" $\tau$, which is given as a time of day. In Figure 2, the raw
trajectories (gray lines) before $\tau$ = 8:00, 12:00 and 16:00 are observations
from the test data, superimposed on the curves (dotted lines) fitted by
functional principal component analysis. After the last observation time
point $\tau$, the predicted traffic flow trajectories (solid lines) are obtained
by the proposed functional mixture prediction model coupled with the 95%
bootstrap prediction intervals. The real data trajectories after times $\tau$
(gray lines) are unobserved in the prediction scenario and are displayed for
comparative purposes. The prediction for the trajectory is dynamically
updated as the "current time" $\tau$ progresses.

Fig. 1. Daily traffic flow trajectories (training data) with the estimated mean function
superimposed on the observed trajectories.

Fig. 2. Two samples of trajectories from the test data set [(a)-(c) for Test sample 1;
(d)-(f) for Test sample 2], with panels at $\tau$ = 8:00, 12:00 and 16:00. The fitted
curves (dotted lines before times $\tau$) and the predicted curves (solid curves after
times $\tau$) with 95% prediction intervals for a partially observed trajectory available
up to times $\tau$ = 8:00, 12:00 and 16:00, superimposed on the complete trajectory
(gray line).

We note that the aforementioned methods in the literature were largely 
developed based on "short-term" or "next-step" traffic prediction modeling, 
where "short-term" refers to a forecast horizon of a time interval. A 15-min 
interval is commonly used as the forecast horizon for traveler-oriented appli- 
cations and operational analysis [Zhang and Ye (2008)]. In contrast to the 
"next-interval" prediction methods in the existing literature, our functional
data method is more flexible, as illustrated in Figure 2, allowing valid pre- 
diction periods extended from the "current time" to the end of the day and 
thus providing more information to relevant users. 

Since future traffic conditions and temporal traffic flow patterns play a 
critical role in traffic prediction [e.g., Smith, William and Oswald (2002), 
Vlahogianni, Karlaftis and Golias (2008)], to take into account distinct daily 
traffic flow patterns, we propose a functional mixture prediction approach 
that combines functional prediction with a probabilistic classification pro- 
cedure. Specifically, we propose to implement functional cluster analysis of 
past traffic flow trajectories to obtain typical daily traffic flow patterns or 
clusters, followed by a probabilistic classification for the traffic flow trajectory
observed thus far. Based on the traffic flow patterns or clusters identified
by the proposed method, we can predict the unobserved traffic flow trajectories
by a functional prediction model in conjunction with the estimated
posterior membership probabilities of traffic flow clusters. Although moti- 
vated by traffic flow analysis and prediction, the proposed methodology is 
by no means restricted to this particular field and is generally applicable to
a wide variety of longitudinally recorded functional data. 

In many real applications, clustering of curve data can be challenging and 
misclassification of an up-to-date and partially observed trajectory can cause
loss of prediction accuracy. Hence, a simple "prediction-after-classification" 
approach based on hard clustering results may not be the best approach. 
This will be illustrated in our numerical studies including the real data 
application and simulations. In contrast, the proposed functional mixture 
prediction framework addresses challenges related to prediction of complex 
functional data, such as those containing heterogeneous patterns and large 
variability over time. The proposed functional mixture prediction approach 
to traffic flow prediction has the following features: 

• It is the first approach to employ functional data techniques to address 
traffic flow prediction applications, which are critical in many intelligent
transportation systems. 

• The functional data approach allows for interval prediction in contrast to 
"next-step" traffic flow prediction applications found in the literature. 

• The proposed functional mixture prediction model plays a central role in 
predicting the future trajectory for an up-to-date and partially observed 
trajectory and takes distinct traffic flow patterns into account to improve
prediction accuracy. This study extends the idea of the subspace projected
functional clustering method of Chiou and Li (2007) to identify distinct
patterns of daily traffic flow from the past data, coupled with the forward
functional testing procedure of Li and Chiou (2011) to determine the 
number of clusters, which lays the groundwork for the functional mixture 
prediction approach. 
• The probabilistic classification approach, including functional clustering
and discrimination, is new. It allows for the prediction of posterior mem- 
bership probabilities using partial information up to the current time, in 
contrast to clusters constructed by complete trajectories from past data. 

• The predictive function in the functional mixture prediction model is built 
on an existing functional linear model, which is widely used in functional 
regression modeling [e.g., Ramsay and Dalzell (1991), Müller, Chiou and
Leng (2009)] and is easy to implement. 

1.2. Literature review of relevant functional data methods. Statistical 
tools for functional data analysis have been extensively developed during 
the past two decades to deal with data samples consisting of curves or other 
infinite-dimensional data objects. Systematic overviews of functional data 
analysis are provided in the monographs of Ramsay and Silverman (2002, 
2005) and Ferraty and Vieu (2006) and in the review articles of Rice (2004), 
Zhao, Marron and Wells (2004) and Müller (2005, 2009). Functional data
analysis provides a wide range of applications in many disciplines. These 
include biomedical and environmental studies [Di et al. (2009), Gao and 
Niemeier (2008)], analysis of time-course gene expression profiles [Müller,
Chiou and Leng (2008), Coffey and Hinde (2011)], linguistic pitch analysis 
[Aston, Chiou and Evans (2010)] and demographic and mortality forecasting
[Hyndman and Shahid Ullah (2007), Chiou and Müller (2009), D'Amato,
Piscopo and Russolillo (2011)], among many others. In relation to functional 
data prediction, Müller and Zhang (2005) proposed a functional data approach
to predicting remaining lifetime and age-at-death distributions from
individual event histories observed up to the current time. More recently, 
Zhou, Serban and Gebraeel (2011) proposed a functional data approach to 
degradation modeling for the evolution of degradation signals and the re- 
maining life distribution. These are relevant works that contain novel func- 
tional data techniques with interesting applications to the prediction of an 
unobserved event for a partial trajectory observed up to the current time. 

Among the various settings in functional regression analysis [Müller (2005)],
models with both the response and predictor variables as functions serve 
this study's purpose with regard to prediction. Functional regression models 
of this kind have been considered, for example, in Yao, Müller and Wang
(2005b), Chiou and Müller (2007), Müller, Chiou and Leng (2008) and Antoch
et al. (2010). Methods of functional data clustering that are found in
the literature include the use of multivariate clustering algorithms on the 
finite-dimensional coefficients of basis function expansions [e.g., Abraham 
et al. (2003), Serban and Wasserman (2005)], model-based functional data 
clustering [e.g., James and Sugar (2003), Ma and Zhong (2008)], a general de- 
scending hierarchical algorithm [Chapter 9 of Ferraty and Vieu (2006)] and 
various depth-based classification methods [Cuevas, Febrero and Fraiman 
(2007), López-Pintado and Romo (2006)], among others. Of particular interest
with regard to functional prediction models are the methods that
define clusters via subspace projection [Chiou and Li (2007, 2008)]. The 
subspace projection method considers cluster differences not only in mean 
functions, but also in eigenfunctions of covariance kernels that take into
account individual random process variations, making it suitable for inter- 
preting the stochastic nature of traffic flow and suggesting a natural link 
with functional regression models. 

This article is organized as follows. In Section 2 we represent traffic flow 
trajectories as a mixture of stochastic processes and discuss functional clus- 
tering and classification methods to take traffic flow patterns into account. 
Section 3 discusses the functional mixture prediction model, including the 
algorithm for implementing functional mixture prediction. Sections 4 and 5 
illustrate the empirical analysis of traffic flow patterns and results of predict- 
ing traffic flow trajectories. Section 6 presents a simulation study to evaluate 
the performance of the functional mixture prediction in comparison with re- 
lated methods. Concluding remarks and discussion are provided in Section 7. 
More information on selecting the number of clusters, the bootstrap prediction
intervals and additional details on the simulation design and results are
deferred to Supplementary Materials [Chiou (2012)].

2. Modeling traffic flow trajectories and clustering traffic flow patterns. 

Previous studies in traffic flow prediction and modeling have revealed that 
traffic condition data is characteristically stochastic, as opposed to chaotic 
[Smith, William and Oswald (2002)]. The stochastic features of traffic flow 
trajectories are suggestive of a functional data approach. In the functional 
data framework, we adopt the notion that each daily traffic flow trajectory 
is a realization of a random function sampled from a mixture of stochastic
processes. Let $Z$ denote the random function for the daily traffic flow
trajectory in the domain $\mathcal{U} = [0, T]$. Here, the random function $Z$ is square
integrable with the inner product of any two functions $f_1$ and $f_2$ defined as
$\langle f_1, f_2 \rangle_{\mathcal{U}} = \int_{\mathcal{U}} f_1(t) f_2(t)\,dt$, with the norm
$\|f_1\|_{\mathcal{U}} = \langle f_1, f_1 \rangle_{\mathcal{U}}^{1/2}$. It is assumed
that the random function $Z(t)$ has a smooth mean function $EZ(t) = \mu_Z(t)$
and covariance function $\mathrm{cov}(Z(s), Z(t)) = G_Z(s,t)$, for $s$ and $t$ in $\mathcal{U}$.
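
On a dense recording grid, the inner product and norm above are typically approximated by numerical quadrature. The following is a minimal sketch using the trapezoidal rule; the function names and the 15-min grid over a 24-hour day are illustrative assumptions rather than part of the method description.

```python
import math

def inner_product(f, g, grid):
    """Trapezoidal approximation of <f, g>_U = integral of f(t) g(t) dt."""
    total = 0.0
    for i in range(len(grid) - 1):
        h = grid[i + 1] - grid[i]
        total += 0.5 * h * (f[i] * g[i] + f[i + 1] * g[i + 1])
    return total

def norm(f, grid):
    """L2 norm induced by the inner product above."""
    return math.sqrt(inner_product(f, f, grid))

# Hypothetical grid: time of day in hours at 15-min spacing (97 points).
grid = [0.25 * i for i in range(97)]
f1 = [1.0 for _ in grid]      # constant function
f2 = [t for t in grid]        # identity function

ip = inner_product(f1, f2, grid)   # approximates the integral of t over [0, 24]
nf1 = norm(f1, grid)               # approximates sqrt(24)
```

The trapezoidal rule is exact for the piecewise linear integrands used here; for noisy flow records a smoothing step would normally precede the integration.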

2.1. Functional clustering of historical traffic flow trajectories. While 
temporal traffic flow patterns are critical in traffic prediction, the under- 
lying traffic flow structures and number of typical patterns are unknown 
and remain to be explored. We assume the mixture process Z consists of 
$K$ subprocesses, with each subprocess corresponding to a cluster. The random
cluster variable $C$ for each individual cluster membership is randomly
distributed among clusters with label $c \in \{1, \ldots, K\}$. For each subprocess
associated with cluster $c$, define the conditional mean function
$E(Z(t) \mid C = c) = \mu^{(c)}(t)$ and covariance function
$\mathrm{cov}(Z(s), Z(t) \mid C = c) = G^{(c)}(s,t)$, for $c \in \{1, \ldots, K\}$.
Let $(\lambda_j^{(c)}, \varphi_j^{(c)})$ be the corresponding eigenvalue-eigenfunction
pairs of the covariance kernel $G^{(c)}$, where the $\lambda_j^{(c)}$ are in nonascending order.
Assume, under mild conditions, each subprocess possesses a Karhunen-Loève
expansion for the daily traffic flow trajectory $Z$ given by

(2.1)    $Z^{(c)}(t) = \mu^{(c)}(t) + \sum_{j=1}^{\infty} \xi_j^{(c)} \varphi_j^{(c)}(t),$

where $\xi_j^{(c)} = \langle Z - \mu^{(c)}, \varphi_j^{(c)} \rangle_{\mathcal{U}}$, with $\langle \varphi_j^{(c)}, \varphi_l^{(c)} \rangle_{\mathcal{U}} = 1$ for $j = l$ and $0$ otherwise.
In practice, it is often the case that the representation only requires a small 
number of components to approximate the trajectories. In general, trajecto- 
ries with simpler structure require fewer components as compared to more 
complex trajectories. 
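
To make the truncated expansion concrete, the sketch below computes principal component scores by numerical integration and rebuilds a trajectory from a small number of components. The mean function and the sine eigenbasis are hypothetical choices made so that the result can be checked by hand; in practice both would be estimated from the data of each cluster.

```python
import math

def trapz(values, grid):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return sum(0.5 * (grid[i + 1] - grid[i]) * (values[i] + values[i + 1])
               for i in range(len(grid) - 1))

def fpc_scores(z, mean, eigenfunctions, grid):
    """Scores xi_j = <Z - mu, phi_j>, approximated by quadrature."""
    centered = [zi - mi for zi, mi in zip(z, mean)]
    return [trapz([c * p for c, p in zip(centered, phi)], grid)
            for phi in eigenfunctions]

def reconstruct(mean, eigenfunctions, scores):
    """Truncated expansion mu(t) + sum_j xi_j phi_j(t)."""
    out = list(mean)
    for xi, phi in zip(scores, eigenfunctions):
        out = [o + xi * p for o, p in zip(out, phi)]
    return out

# Hypothetical setup on [0, 1]: orthonormal basis sqrt(2) sin(j pi t).
grid = [i / 200 for i in range(201)]
mean = [10.0 + 5.0 * t for t in grid]
phis = [[math.sqrt(2.0) * math.sin(j * math.pi * t) for t in grid]
        for j in (1, 2, 3)]

# A trajectory that lies exactly in the span of the first two components.
z = [m + 2.0 * p1 - 0.5 * p2 for m, p1, p2 in zip(mean, phis[0], phis[1])]

scores = fpc_scores(z, mean, phis[:2], grid)
z_tilde = reconstruct(mean, phis[:2], scores)
```

Because the example trajectory uses only two components, two scores reproduce it; a more complex trajectory would leave a nonzero residual at the same truncation level.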

Following the conventional approach, the best cluster membership $c^*$
given $Z$ is determined by maximizing the posterior probability $P_{C|Z}(\cdot \mid \cdot)$
such that

(2.2)    $c^*(Z) = \arg\max_{c \in \{1, \ldots, K\}} P_{C|Z}(c \mid Z).$

We propose estimating the posterior membership probability $P(C = c \mid Z)$
using the so-called discriminative approach, as opposed to the generative 
approach [see, e.g., Dawid (1976), Bishop and Lasserre (2007)]. While there 
is no general consensus for choosing between generative and discriminative 
approaches [Ng and Jordan (2002), Xue and Titterington (2008)], the for- 
mer requires a priori knowledge on the class-conditional probability density 
functions, information that is difficult to justify incorporating for the traffic 
flow trajectories. It is easier to use the discriminative approach that directly 
estimates the class-membership probabilities without attempting to model 
the underlying probability distributions of the random functions. Following 
this line, the multiclass logit model is a popular method for estimating the 
posterior membership probabilities. We propose incorporating a distance 
measure between Z and its projection associated with each cluster as the 
covariate in the multiclass logit model. 

Consider the relative distance as the distance measure based on cluster
subspace projection as

(2.3)    $d^{(c)} = \frac{\|Z - \tilde{Z}^{(c)}\|_{\mathcal{U}}^2}{\sum_{k=1}^{K} \|Z - \tilde{Z}^{(k)}\|_{\mathcal{U}}^2},$

where $\tilde{Z}^{(c)}(t) = \mu^{(c)}(t) + \sum_{j=1}^{M_c} \xi_j^{(c)} \varphi_j^{(c)}(t)$, with
$\xi_j^{(c)} = \langle Z - \mu^{(c)}, \varphi_j^{(c)} \rangle_{\mathcal{U}}$. The
value $M_c$ is finite and is chosen data-adaptively so that $Z$ is well approximated
by $\tilde{Z}^{(c)}$ using the $M_c$ components. Let $\mathbf{d} = (1, d^{(1)}, \ldots, d^{(K-1)})^\top$ and
$\gamma_c = (\gamma_{0c}, \gamma_{1c}, \ldots, \gamma_{(K-1)c})^\top$. Taking the vector $\mathbf{d}$ as the covariate, we can
estimate the posterior cluster membership probability using the multiclass
logit model,

(2.4)    $P(C = c \mid Z) = \frac{\exp\{\gamma_c^\top \mathbf{d}\}}{\sum_{k=1}^{K} \exp\{\gamma_k^\top \mathbf{d}\}}$

for $c = 1, \ldots, K-1$ and $P(C = K \mid Z) = 1 - \sum_{c=1}^{K-1} P(C = c \mid Z)$, with the
$K$th cluster being the baseline. The vector of regression coefficients $\gamma_c$
remains to be estimated.
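
The relative distance and the logit step can be illustrated numerically. This is a minimal sketch under stated assumptions: the projections and the coefficient vector `gammas` are made-up values, and the baseline cluster is assigned a zero linear predictor (that is, $\gamma_K = 0$), which is the usual convention for a baseline category but is not spelled out in the text.

```python
import math

def sq_norm(f, grid):
    """Trapezoidal approximation of the squared L2 norm of f."""
    return sum(0.5 * (grid[i + 1] - grid[i]) * (f[i] ** 2 + f[i + 1] ** 2)
               for i in range(len(grid) - 1))

def relative_distances(z, projections, grid):
    """Relative squared distance of z to each cluster projection."""
    sq = [sq_norm([a - b for a, b in zip(z, p)], grid) for p in projections]
    total = sum(sq)
    return [s / total for s in sq]

def posterior_probabilities(rel_dist, gammas):
    """Multiclass logit with covariate d = (1, d_1, ..., d_{K-1}).

    gammas holds K-1 coefficient vectors; the K-th (baseline) cluster
    gets a linear predictor of zero (assumption).
    """
    d = [1.0] + rel_dist[:-1]
    linear = [sum(g * x for g, x in zip(gamma, d)) for gamma in gammas]
    linear.append(0.0)
    top = max(linear)                       # subtract max for stability
    weights = [math.exp(v - top) for v in linear]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical two-cluster example on [0, 1].
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
z = [1.0] * 5
projections = [[1.1] * 5, [2.0] * 5]        # projections onto clusters 1 and 2
gammas = [[2.0, -10.0]]                     # made-up (gamma_01, gamma_11)

rel = relative_distances(z, projections, grid)
probs = posterior_probabilities(rel, gammas)
```

A negative slope on $d^{(1)}$ means that a smaller relative distance to cluster 1 raises its posterior probability, which matches the intuition behind the distance covariate.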

Clusters defined by criterion (2.2) are based on subspace projection. Let
$S_M^{(c)}$ be the linear span of the set of eigenfunctions
$\{\varphi_1^{(c)}, \ldots, \varphi_{M_c}^{(c)}\}$, $c = 1, \ldots, K$. For identifiability,
it is assumed that for any two clusters $c$ and
$d$ the following two conditions do not hold simultaneously: (i) $S_M^{(c)}$ belongs
to $S_M^{(d)}$; (ii) $\mu^{(c)} = \mu^{(d)}$, or $\mu^{(c)} \in S_M^{(d)}$ and $\mu^{(d)} \in S_M^{(d)}$. These conditions were
derived in Theorem 1 of Chiou and Li (2007) for identifiability of clusters 
defined via subspace projection. Criterion (2.2) leads to clusters with similar 
curves that are embedded in the cluster subspace spanned by the cluster cen- 
ter components, the mean function and the eigenfunctions of the covariance 
kernel that represent the functional principal component subspace. 

In addition, the number of clusters is unknown, and must be determined 
in practice. The method used to determine the number of clusters in this 
study is based on a sequence of tests on cluster structures done to ensure 
statistical significance in the difference between cluster types as proposed 
in Li and Chiou (2011). The number of clusters $K$ is determined by testing
a sequence of null hypotheses $H_{01}: \mu^{(c)} = \mu^{(d)}$ and $H_{02}: S_M^{(c)} = S_M^{(d)}$, for
$1 \le c < d \le K$. The forward functional testing procedure aims to search for
the maximum number of clusters while retaining differences with statisti- 
cal significance among the clusters. The procedure is especially suitable for 
the subspace projected functional clustering method. The sequence of the 
functional hypothesis tests helps identify significant differences between clus- 
ter structures and provides additional insight into further cluster analysis. 
Since the hypothesis tests are based on bootstrap resampling methods, it 
takes substantial computational time to construct the reference distribution. 
Details of the procedure are discussed in Li and Chiou (2011) and we have 
briefly summarized them in Supplementary Material A [Chiou (2012)]. 

2.2. Probabilistic functional classification of traffic flow patterns. For the 
purpose of prediction, the time domain $\mathcal{U}$ of the process $Z$ is decomposed
into two exclusive time domains $S(\tau) = [0, \tau]$ and $T(\tau) = [\tau, T]$. Now, let
$Z^*$ be a newly observed trajectory of the process $Z$, denoted by $Z^*_{S(\tau)}$ as
observed up to time $\tau$. We predict the cluster membership probability of the
trajectory $Z^*$ based on the known trajectory $Z^*_{S(\tau)}$ observed until time $\tau$,
which will then be used to predict the unobserved trajectory $Z^*_{T(\tau)}$.

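We define the relative $L^2$ distance in a manner similar to (2.3) via cluster
subspace projection, but it is based on the partially observed $Z^*_{S(\tau)}$ rather
than the entire $Z^*$, since the part $Z^*_{T(\tau)}$ is not yet observed. Suppose that
the cluster subspaces $\mu^{(c)}$ and $\{\varphi^{(c)}_{S(\tau),j}\}$, $c = 1, \ldots, K$, have been identified as
in Section 2.1. Then, the relative $L^2$ distance is defined as

(2.5)    $d^{*(c)}_{S(\tau)} = \frac{\|Z^*_{S(\tau)} - \tilde{Z}^{*(c)}_{S(\tau)}\|^2_{S(\tau)}}{\sum_{k=1}^{K} \|Z^*_{S(\tau)} - \tilde{Z}^{*(k)}_{S(\tau)}\|^2_{S(\tau)}},$

where $\tilde{Z}^{*(c)}_{S(\tau)}(s) = \mu^{(c)}(s) + \sum_{j=1}^{M_c} \xi^{*(c)}_{S(\tau),j} \varphi^{(c)}_{S(\tau),j}(s)$ and
$\xi^{*(c)}_{S(\tau),j} = \langle Z^*_{S(\tau)} - \mu^{(c)}_{S(\tau)}, \varphi^{(c)}_{S(\tau),j} \rangle_{S(\tau)}$.
Here, the set of eigenfunctions $\{\varphi^{(c)}_{S(\tau),j}\}$ corresponds to the covariance
kernel $G^{(c)}_{S(\tau)}$ of the random process $Z^{(c)}_{S(\tau)}$. Taking
$\mathbf{d}^*_{S(\tau)} = (1, d^{*(1)}_{S(\tau)}, \ldots, d^{*(K-1)}_{S(\tau)})^\top$
as the covariate, we can predict the cluster membership probability
based on the newly observed $Z^*_{S(\tau)}$ using the multiclass logit model

(2.6)    $P(C = c \mid Z^*_{S(\tau)}) = \frac{\exp\{\gamma_c^\top \mathbf{d}^*_{S(\tau)}\}}{\sum_{k=1}^{K} \exp\{\gamma_k^\top \mathbf{d}^*_{S(\tau)}\}}$

for $c = 1, \ldots, K-1$, and $P(C = K \mid Z^*_{S(\tau)}) = 1 - \sum_{c=1}^{K-1} P(C = c \mid Z^*_{S(\tau)})$, with
the $K$th cluster being the baseline. We note that the vector of coefficients $\gamma_c$
here is the same as that in (2.4) based on the historical or training data.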

2.3. Estimation for probabilistic functional classification. In practice, the 
observed trajectories may be contaminated with random measurement errors.
Let $Y_i(t_{ij})$ be the $j$th observation of the $i$th individual flow trajectory
from the underlying process $Z_i^{(c)}$ of cluster $c$ observed at time $t_{ij}$, $0 \le t_{ij} \le T$,
such that $Y_i(t_{ij}) = Z_i^{(c)}(t_{ij}) + \epsilon_{ij}$, where $Z_i^{(c)}$ is the underlying random
function such that $Z_i^{(c)}(t) = \mu^{(c)}(t) + \sum_{k=1}^{\infty} \xi_{ik}^{(c)} \varphi_k^{(c)}(t)$, and the random
measurement errors $\epsilon_{ij}$ are independent of $\xi_{ik}^{(c)}$ with mean zero and variance $\sigma^2$.

To identify the structures of cluster subspaces, $\{\mu^{(c)}, \{\varphi_k^{(c)}\}_{k=1,\ldots,M_c}\}$,
$c = 1, \ldots, K$, we follow the idea of defining clusters via subspace projection
and apply the proposed subspace-projected functional clustering procedure
to the training data set. In the initial step, since cluster membership is
unknown, the clustering is based on functional principal component scores of
an overall single random process. For details of the initial clustering, see
Section 2.2.1 of Chiou and Li (2007). In the iterative updating step, cluster
membership is determined by criterion (2.2) in a hard clustering manner.
The clustering procedure alternates between identifying
(i) cluster subspaces and (ii) cluster memberships until convergence.

Cluster subspaces. Given the observations $\{(t_{ij}, Y_i(t_{ij})), i = 1, \ldots, n, j = 1, \ldots, m\}$
from the historical or training data, and the cluster memberships
of the trajectories, using the observations belonging to cluster $c$, the mean
function $\mu^{(c)}$ can be estimated by applying the locally weighted least squares
method, while the estimates of the components $\varphi_j^{(c)}$ and $\lambda_j^{(c)}$ rely on the
covariance estimate $\hat{G}^{(c)}$, obtained by smoothing the scatterplot data
$\{(Y_{ij} - \hat\mu^{(c)}(t_{ij}))(Y_{il} - \hat\mu^{(c)}(t_{il}))\}$ to fit a local linear plane. Details of this estimation
can be found in Chiou, Müller and Wang (2003) and Yao, Müller and Wang
(2005a), for example. The smoothing parameters in the mean and covariance
estimation steps are chosen data-adaptively via the 10-fold cross-validation
method. An estimate of $\xi_{ik}^{(c)}$ can be obtained by the conditional expectation
approach of Yao, Müller and Wang (2005a) for the case of sparse designs.
Here, we simply obtain the estimate $\hat\xi_{ik}^{(c)} = \langle Y_i - \hat\mu^{(c)}, \hat\varphi_k^{(c)} \rangle_{\mathcal{U}}$ by numerical
approximation for the case of dense designs of traffic flow recording. The
value $M_c$ is selected as the minimum that reaches a certain level of the
proportion of total variance explained by the $M_c$ leading components such
that

(2.7)    $M_c = \arg\min_{L \ge 1} \Big\{ \sum_{j=1}^{L} \hat\lambda_j^{(c)} \Big/ \sum_{j \ge 1} \hat\lambda_j^{(c)} 1_{\{\hat\lambda_j^{(c)} > 0\}} \ge \delta \Big\},$

where $\delta = 90\%$ in this study. These cluster structure estimates are used in
turn to estimate the vector of regression coefficients $\gamma_c$ in (2.8) below.
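
The selection rule (2.7) amounts to a cumulative-sum search over the estimated eigenvalues. A minimal sketch follows, assuming the eigenvalues are already sorted in nonascending order; the function name and the example eigenvalues are hypothetical.

```python
def select_num_components(eigenvalues, delta=0.90):
    """Smallest M whose leading eigenvalues explain at least a fraction
    delta of the total variance carried by the positive eigenvalues.

    Assumes `eigenvalues` is sorted in nonascending order.
    """
    positive = [lam for lam in eigenvalues if lam > 0]
    if not positive:
        raise ValueError("no positive eigenvalues")
    total = sum(positive)
    running = 0.0
    for m, lam in enumerate(positive, start=1):
        running += lam
        if running / total >= delta:
            return m
    return len(positive)

# Hypothetical estimated eigenvalues for one cluster; the small negative
# value mimics numerical noise in a smoothed covariance estimate.
lams = [5.0, 3.0, 1.5, 0.4, 0.1, -0.02]
m_c = select_num_components(lams)          # threshold delta = 90%
```

Dropping nonpositive eigenvalues mirrors the indicator in the denominator of (2.7).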

Cluster memberships. Given the structure of each cluster based on the
training data, we then use the discriminative approach to fit the posterior
probabilities of cluster membership $P(C = c \mid Y_i)$ such that

(2.8)    $P(C = c \mid Y_i) = \frac{\exp\{\gamma_c^\top \mathbf{d}_i\}}{\sum_{k=1}^{K} \exp\{\gamma_k^\top \mathbf{d}_i\}}, \quad c = 1, \ldots, K-1,$

and $P(C = K \mid Y_i) = 1 - \sum_{k=1}^{K-1} P(C = k \mid Y_i)$, taking the $K$th cluster as
the baseline. Here, the relative distance vector $\mathbf{d}_i = (1, d_i^{(1)}, \ldots, d_i^{(K-1)})^\top$
serves as the predictor variable and is calculated by

(2.9)    $d_i^{(c)} = \frac{\|Y_i - \hat{Z}_i^{(c)}\|^2_{\mathcal{U}}}{\sum_{k=1}^{K} \|Y_i - \hat{Z}_i^{(k)}\|^2_{\mathcal{U}}}, \quad c = 1, \ldots, K-1,$

where $\hat{Z}_i^{(c)}(s) = \hat\mu^{(c)}(s) + \sum_{j=1}^{M_c} \hat\xi_{ij}^{(c)} \hat\varphi_j^{(c)}(s)$, with $\hat\xi_{ij}^{(c)}$ defined above. The
coefficient estimates $\hat\gamma_c = (\hat\gamma_{0c}, \hat\gamma_{1c}, \ldots, \hat\gamma_{(K-1)c})^\top$ are obtained by the conventional
iteratively reweighted least squares method [McCullagh and Nelder (1983)].
The resulting estimate (2.8) is used to determine the cluster membership
according to (2.2).

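Now, given a newly observed trajectory $Y^*$ up to time $\tau$, denoted by
$Y^*_{S(\tau)}$, we obtain the covariate vector
$\hat{\mathbf{d}}^*_{S(\tau)} = (1, \hat d^{*(1)}_{S(\tau)}, \ldots, \hat d^{*(K-1)}_{S(\tau)})^\top$, where

(2.10)    $\hat d^{*(c)}_{S(\tau)} = \frac{\|Y^*_{S(\tau)} - \hat{Z}^{*(c)}_{S(\tau)}\|^2_{S(\tau)}}{\sum_{k=1}^{K} \|Y^*_{S(\tau)} - \hat{Z}^{*(k)}_{S(\tau)}\|^2_{S(\tau)}}, \quad c = 1, \ldots, K-1.$

Here, $\hat{Z}^{*(c)}_{S(\tau)}(s) = \hat\mu^{(c)}(s) + \sum_{j=1}^{M_c} \hat\xi^{*(c)}_{S(\tau),j} \hat\varphi^{(c)}_{S(\tau),j}(s)$, and $\hat\xi^{*(c)}_{S(\tau),j}$ can be obtained
by a numerical approximation to $\langle Z^*_{S(\tau)} - \mu^{(c)}_{S(\tau)}, \varphi^{(c)}_{S(\tau),j} \rangle_{S(\tau)}$. To obtain
$\{\hat\varphi^{(c)}_{S(\tau),j}\}$, we simply decompose the covariance estimate $\hat{G}^{(c)}$ into blocks
corresponding to the time domains $S(\tau)$ and $T(\tau)$ without re-estimating
the covariance function, making the dynamical prediction step easy to
implement for any given $\tau$. We predict the cluster membership for the newly
observed $Y^*_{S(\tau)}$ by the posterior probability

(2.11)    $\hat P(C = c \mid Y^*_{S(\tau)}) = \frac{\exp\{\hat\gamma_c^\top \hat{\mathbf{d}}^*_{S(\tau)}\}}{\sum_{k=1}^{K} \exp\{\hat\gamma_k^\top \hat{\mathbf{d}}^*_{S(\tau)}\}}, \quad c = 1, \ldots, K-1,$

and $\hat P(C = K \mid Y^*_{S(\tau)}) = 1 - \sum_{c=1}^{K-1} \hat P(C = c \mid Y^*_{S(\tau)})$.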
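
The block decomposition of the covariance estimate can be sketched on a discrete grid. The example below is illustrative only: it uses a made-up rank-one covariance, a simple Riemann-sum quadrature, and power iteration for the leading eigenpair, none of which are prescribed by the text. It shows that the eigenfunctions on $S(\tau)$ and $T(\tau)$ are read off the corresponding sub-blocks without refitting the covariance.

```python
import math

def sub_block(cov, idx):
    """Square block of a covariance matrix for the grid indices idx."""
    return [[cov[i][j] for j in idx] for i in idx]

def leading_eigenpair(block, h, iters=200):
    """Power iteration for the leading eigenpair of the integral operator
    with kernel `block`, using Riemann-sum quadrature with spacing h.
    The eigenfunction is scaled so that h * sum(v^2) = 1."""
    n = len(block)
    v = [1.0 / math.sqrt(h * n)] * n
    for _ in range(iters):
        w = [h * sum(block[i][j] * v[j] for j in range(n)) for i in range(n)]
        size = math.sqrt(h * sum(x * x for x in w))
        if size == 0.0:
            break
        v = [x / size for x in w]
    w = [h * sum(block[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = h * sum(a * b for a, b in zip(v, w))   # Rayleigh quotient
    return lam, v

# Hypothetical rank-one covariance G(s, t) = 4 g(s) g(t) on [0, 1].
h = 0.05
grid = [h * i for i in range(21)]
g = [1.0 + t for t in grid]
cov = [[4.0 * a * b for b in g] for a in g]

tau_index = 10                                   # tau = 0.5
s_idx = list(range(0, tau_index + 1))            # grid points in S(tau)
t_idx = list(range(tau_index, len(grid)))        # grid points in T(tau)

lam_s, phi_s = leading_eigenpair(sub_block(cov, s_idx), h)
lam_t, psi_t = leading_eigenpair(sub_block(cov, t_idx), h)
```

For a covariance with several non-negligible components, a full symmetric eigensolver would replace the power iteration; the block extraction step stays the same as $\tau$ moves.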

3. Functional mixture prediction of future traffic flow trajectories. To
accurately predict traffic flow trajectories under various traffic conditions,
we combine the functional prediction model with functional clustering and
classification methods. Given a newly observed trajectory $Z^*_{S(\tau)}$ of the
process $Z$ as observed up to time $\tau$, we propose a functional mixture prediction
model to predict the trajectory of $Z^*$ on the time interval $T(\tau) = [\tau, T]$,
denoted by $Z^*_{T(\tau)}$, as

(3.1)    $E(Z^*_{T(\tau)}(t) \mid Z^*_{S(\tau)}) = \sum_{c=1}^{K} P(C = c \mid Z^*_{S(\tau)})\, \tilde{Z}^{*(c)}_{T(\tau)}(t),$

where $\tilde{Z}^{*(c)}_{T(\tau)}(t) = E(Z^*_{T(\tau)}(t) \mid Z^*_{S(\tau)}, C = c)$ is the predictive function
conditional on cluster $C = c$, and $P(C = c \mid Z^*_{S(\tau)})$ is the posterior probability
of cluster membership given the newly observed trajectory $Z^*_{S(\tau)}$ up
to time $\tau$. The functional mixture prediction model (3.1), obtained by the
law of iterated expectation on the random cluster membership variable $C$,
$E\{E(Z^*_{T(\tau)}(t) \mid Z^*_{S(\tau)}, C)\}$, minimizes the expected risk, $E\{\mathcal{L}(\tilde{Z}^*_{T(\tau)}, Z^*_{T(\tau)})\}$,
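where $\tilde{Z}^*_{T(\tau)}(t) = E(Z^*_{T(\tau)}(t) \mid Z^*_{S(\tau)}, C)$ and the loss function is defined as the squared $L^2$ distance, $\mathcal{L}(\tilde{Z}^*_{T(\tau)}, Z^*_{T(\tau)}) = \int_{T(\tau)} \{\tilde{Z}^*_{T(\tau)}(t) - Z^*_{T(\tau)}(t)\}^2\,dt$.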
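
As a small numerical illustration of (3.1), the sketch below contrasts the mixture prediction with a hard "prediction-after-classification" rule that keeps only the most probable cluster. The posterior probabilities and cluster-conditional predictions are made-up numbers, and the function names are hypothetical.

```python
def mixture_prediction(posteriors, cluster_predictions):
    """Posterior-weighted average of cluster-conditional predictions."""
    n = len(cluster_predictions[0])
    return [sum(p * pred[i] for p, pred in zip(posteriors, cluster_predictions))
            for i in range(n)]

def hard_prediction(posteriors, cluster_predictions):
    """Prediction from the single cluster with the largest posterior."""
    c_star = max(range(len(posteriors)), key=lambda c: posteriors[c])
    return list(cluster_predictions[c_star])

# Hypothetical posteriors for K = 3 clusters and predicted flow rates
# at two future time points.
posteriors = [0.6, 0.3, 0.1]
cluster_predictions = [[20.0, 22.0], [30.0, 28.0], [10.0, 12.0]]

soft = mixture_prediction(posteriors, cluster_predictions)
hard = hard_prediction(posteriors, cluster_predictions)
```

When one posterior probability is close to one the two rules nearly coincide; they differ most when the partially observed trajectory is ambiguous between clusters.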

3.1. Functional linear regression of traffic flow trajectories. In a regression
setting, the process $Z(s)$, for $s \in S(\tau)$, denoted by $Z_{S(\tau)}$, serves as the
predictor function and the process $Z(t)$, for $t \in T(\tau)$, denoted by $Z_{T(\tau)}$, is
the response function. The subspace projected functional clustering method
developed above is well suited to identifying clusters in conjunction with
functional prediction. For each cluster subspace, $Z^{(c)}$ is decomposed into
$Z^{(c)}_{S(\tau)}(s)$ and $Z^{(c)}_{T(\tau)}(t)$, whose Karhunen-Loève expansions can be obtained
such that $Z^{(c)}_{S(\tau)}(s) = \mu^{(c)}(s) + \sum_{j=1}^{\infty} \xi^{(c)}_{S(\tau),j} \varphi^{(c)}_{S(\tau),j}(s)$ and
$Z^{(c)}_{T(\tau)}(t) = \mu^{(c)}(t) + \sum_{k=1}^{\infty} \zeta^{(c)}_{T(\tau),k} \psi^{(c)}_{T(\tau),k}(t)$, where the notation
$\xi^{(c)}_{S(\tau),j}$, $\varphi^{(c)}_{S(\tau),j}$, $\zeta^{(c)}_{T(\tau),k}$ and $\psi^{(c)}_{T(\tau),k}$
are defined analogously to those on the entire domain $\mathcal{U}$, but they correspond
to the sub-domains $S(\tau)$ and $T(\tau)$.

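We consider a functional linear regression model [e.g., Ramsay and Dalzell
(1991), Müller, Chiou and Leng (2008)] conditional on cluster membership,

(3.2)    $E(Z_{T(\tau)}(t) \mid Z_{S(\tau)}, C = c) = \mu^{(c)}(t) + \int_{S(\tau)} \beta^{(c)}_\tau(s,t) \{Z_{S(\tau)}(s) - \mu^{(c)}(s)\}\,ds$

for all $t \in T(\tau)$. Here, given a fixed value of $\tau$, assume the bivariate regression
function $\beta^{(c)}_\tau(s,t)$ is smooth and square integrable, that is,
$\int_{T(\tau)} \int_{S(\tau)} \{\beta^{(c)}_\tau(s,t)\}^2\,ds\,dt < \infty$. Under the smoothness assumption on the
underlying random process, we further assume that the bivariate regression
function $\beta^{(c)}_\tau(s,t)$ is a smooth function of $\tau$ for all $s$ and $t$. Using the
eigenbasis expansion for the regression coefficient function such that
$\beta^{(c)}_\tau(s,t) = \sum_{k=1}^{\infty} \sum_{j=1}^{\infty} \beta^{(c)}_{\tau,kj} \varphi^{(c)}_{S(\tau),j}(s) \psi^{(c)}_{T(\tau),k}(t)$, model (3.2) can be expressed as

(3.3)    $E(Z_{T(\tau)}(t) \mid Z_{S(\tau)}, C = c) = \mu^{(c)}(t) + \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} \beta^{(c)}_{\tau,kj} \xi^{(c)}_{S(\tau),j} \psi^{(c)}_{T(\tau),k}(t),$

where $\xi^{(c)}_{S(\tau),j} = \langle Z_{S(\tau)} - \mu^{(c)}_{S(\tau)}, \varphi^{(c)}_{S(\tau),j} \rangle_{S(\tau)}$ and
$\beta^{(c)}_{\tau,kj} = E(\xi^{(c)}_{S(\tau),j} \zeta^{(c)}_{T(\tau),k}) / E\{(\xi^{(c)}_{S(\tau),j})^2\}$
are the regression parameters to be estimated. Under the
smoothness assumption on $\beta^{(c)}_\tau(s,t)$ along with $\tau$, it follows that $\beta^{(c)}_{\tau,kj}$ is
also smooth in $\tau$ for all $k$ and $j$.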
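
The step from (3.2) to (3.3) can be spelled out as follows. This is a sketch that assumes the eigenfunctions on $S(\tau)$ are orthonormal, that the scores $\xi^{(c)}_{S(\tau),j}$ are uncorrelated across $j$, and that sums, integrals and expectations may be interchanged; the superscript $(c)$ and the subscripts $S(\tau)$, $T(\tau)$ are dropped to lighten notation.

```latex
\begin{aligned}
\int_{S(\tau)} \beta_\tau(s,t)\{Z_{S}(s)-\mu(s)\}\,ds
 &= \int_{S(\tau)} \Big\{\sum_{k}\sum_{j}\beta_{\tau,kj}\,\varphi_{j}(s)\,\psi_{k}(t)\Big\}
    \Big\{\sum_{l}\xi_{l}\,\varphi_{l}(s)\Big\}\,ds \\
 &= \sum_{k}\sum_{j}\sum_{l}\beta_{\tau,kj}\,\xi_{l}\,\psi_{k}(t)\,
    \langle \varphi_{j},\varphi_{l}\rangle_{S(\tau)} \\
 &= \sum_{j}\sum_{k}\beta_{\tau,kj}\,\xi_{j}\,\psi_{k}(t).
\end{aligned}
% Projecting both sides of (3.3) onto \psi_k gives
%   E(\zeta_k \mid Z_S, C = c) = \sum_j \beta_{\tau,kj}\,\xi_j .
% Multiplying by \xi_j and taking expectations, with E(\xi_j \xi_l) = 0 for j \ne l:
%   E(\xi_j \zeta_k) = \beta_{\tau,kj}\,E(\xi_j^2),
% which yields \beta_{\tau,kj} = E(\xi_j \zeta_k) / E(\xi_j^2).
```

The last identity is what allows each coefficient to be treated as the slope of a simple regression of one response score on one predictor score.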

3.2. Functional linear prediction model for future traffic flow. Given $Z^*_{S(\tau)}$,
we aim to predict the values of $Z^*_{T(\tau)}$. Suppose that the cluster structures
$\mu^{(c)}$ and $\{\varphi^{(c)}_{S(\tau),j}\}$ and the regression coefficients $\beta^{(c)}_{\tau,kj}$ are given. In practice,
these estimates can be obtained from the functional clustering and the
functional regression analysis using the historical or training data as described
in Section 2. Then, the functional prediction model below is used to predict
the unobserved trajectory conditional on a specific cluster:

(3.4)    $\tilde{Z}^{*(c)}_{T(\tau)}(t) = \mu^{(c)}(t) + \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} \beta^{(c)}_{\tau,kj} \xi^{*(c)}_{S(\tau),j} \psi^{(c)}_{T(\tau),k}(t)$

for all $t \in T(\tau)$, where $\xi^{*(c)}_{S(\tau),j} = \langle Z^*_{S(\tau)} - \mu^{(c)}_{S(\tau)}, \varphi^{(c)}_{S(\tau),j} \rangle_{S(\tau)}$ is obtained
by numerical approximation.

Finally, given a partially observed trajectory $Z^*_{S(\tau)}$, the unobserved
trajectory $Z^*_{T(\tau)}$ can be predicted by the functional mixture prediction model (3.1)
using the results of the functional prediction model (3.4) in conjunction with
the multiclass logit model (2.6). However, the components in these models
remain to be estimated. The estimation procedure for the functional linear
model is briefly summarized below.

3.3. Estimation for functional mixture prediction models. We note that
the estimation of $\beta^{(c)}_{\tau,kj}$ in (3.3) and (3.4) can further be simplified using a
simple linear regression approach [Müller, Chiou and Leng (2008)], such that

$E(\zeta^{(c)}_{T(\tau),k} \mid \xi^{(c)}_{S(\tau),j}) = \beta^{(c)}_{\tau,kj}\, \xi^{(c)}_{S(\tau),j}$

for all pairs of $(k, j)$. Therefore, functional linear regression can be
decomposed into a series of simple linear regressions of functional principal
component scores of the response processes in relation to those of the predictor
processes.

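For our predictions, given the cluster membership information and the subspace
structure of each cluster, we estimate $\beta^{(c)}_{\tau}(s,t)$ in the
functional linear regression model (3.2) based on the training data. Given the
estimated principal component functions
$\hat\varphi^{(c)}_{\mathcal{S}(\tau),j}(t)$ and
$\hat\varphi^{(c)}_{\mathcal{T}(\tau),k}(t)$ and the principal component scores
$\hat\xi^{(c)}_{\mathcal{S}(\tau),i,j}$ and
$\hat\xi^{(c)}_{\mathcal{T}(\tau),i,k}$, the estimate of
$\beta^{(c)}_{\tau,kj}$ can be obtained by

$$\hat\beta^{(c)}_{\tau,kj} = \frac{\sum_{i=1}^{n_c} \bigl(\hat\xi^{(c)}_{\mathcal{S}(\tau),i,j} - \bar\xi^{(c)}_{\mathcal{S}(\tau),j}\bigr)\bigl(\hat\xi^{(c)}_{\mathcal{T}(\tau),i,k} - \bar\xi^{(c)}_{\mathcal{T}(\tau),k}\bigr)}{\sum_{i=1}^{n_c} \bigl(\hat\xi^{(c)}_{\mathcal{S}(\tau),i,j} - \bar\xi^{(c)}_{\mathcal{S}(\tau),j}\bigr)^2}, \tag{3.5}$$

where $\bar\xi^{(c)}_{\mathcal{S}(\tau),j}$ and
$\bar\xi^{(c)}_{\mathcal{T}(\tau),k}$ are the sample averages of
$\hat\xi^{(c)}_{\mathcal{S}(\tau),i,j}$ and
$\hat\xi^{(c)}_{\mathcal{T}(\tau),i,k}$, respectively.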
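The estimate above is the ordinary least-squares slope of one response score on one predictor score, computed for every pair $(k,j)$. A minimal sketch, assuming the scores of the $n_c$ training trajectories of a cluster are already stored as arrays, is given below; the function name `fpc_regression_coefficients` and the array layout are hypothetical.

```python
import numpy as np

def fpc_regression_coefficients(xi_s, xi_t):
    """Pairwise least-squares slopes between principal component scores.

    xi_s : (n, J) predictor scores, one row per training trajectory
    xi_t : (n, K) response scores for the same trajectories
    Returns beta with beta[k, j] = slope of response score k on predictor score j.
    """
    xs = xi_s - xi_s.mean(axis=0)   # centre by the sample averages
    xt = xi_t - xi_t.mean(axis=0)
    cross = xt.T @ xs               # (K, J) sums of cross-products
    ss = np.sum(xs ** 2, axis=0)    # (J,) sums of squares of predictor scores
    return cross / ss               # column j is divided by ss[j]
```

Each entry agrees with the slope returned by a separate simple linear regression of the two score vectors, which is the decomposition the text describes.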

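To take advantage of smoothness in prediction as the value $\tau$ progresses,
we further smooth the estimates
$\{\hat\beta^{(c)}_{\tau,kj}, \tau = \tau_1,\ldots,\tau_Q\}$ over $\tau$ to
obtain the smooth estimates $\tilde\beta^{(c)}_{\tau,kj}$, where $Q$ is the
number of time points at which predicting the future trajectory is of interest.
Here, we use the local linear smoothing method with cross-validated bandwidth
[see, e.g., Fan and Gijbels (1996)]. Accordingly, using
$\tilde\beta^{(c)}_{\tau,kj}$ and the estimates
$\hat\mu^{(c)}_{\mathcal{T}(\tau)}(t)$ and
$\hat\varphi^{(c)}_{\mathcal{T}(\tau),k}(t)$, we obtain the predicted
trajectory conditional on cluster $c$ by

$$\hat Z^{*(c)}_{\mathcal{T}(\tau)}(t) = \hat\mu^{(c)}_{\mathcal{T}(\tau)}(t) + \sum_{j=1}^{M_c}\sum_{k=1}^{M_c} \tilde\beta^{(c)}_{\tau,kj}\,\hat\xi^{*(c)}_{\mathcal{S}(\tau),j}\,\hat\varphi^{(c)}_{\mathcal{T}(\tau),k}(t) \tag{3.6}$$

for all $t \in \mathcal{T}(\tau)$. Here, $M_c$ is determined by (2.7). Finally,
combining the results of (3.6) with (2.11), we obtain the predicted unobserved
traffic flow trajectory

$$\hat Z^*_{\mathcal{T}(\tau)}(t) = \hat E\bigl(Z^*_{\mathcal{T}(\tau)}(t) \mid Y^*_{\mathcal{S}(\tau)}\bigr) = \sum_{c=1}^{K} \hat Z^{*(c)}_{\mathcal{T}(\tau)}(t)\,\hat P\bigl(C = c \mid Y^*_{\mathcal{S}(\tau)}\bigr). \tag{3.7}$$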
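The smoothing of the raw coefficient estimates over $\tau$ mentioned above can be sketched as follows. This is a generic local linear smoother with a Gaussian kernel and a leave-one-out cross-validated bandwidth; the kernel, the candidate bandwidths and the function names `local_linear` and `loo_cv_bandwidth` are assumptions made for illustration, since the text does not specify them.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate at x0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)              # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])      # local design matrix
    WX = X * w[:, None]
    coef = np.linalg.solve(X.T @ WX, WX.T @ y)          # weighted least squares
    return coef[0]                                      # intercept = fit at x0

def loo_cv_bandwidth(x, y, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out error."""
    best_h, best_err = None, np.inf
    idx = np.arange(len(x))
    for h in candidates:
        err = 0.0
        for i in idx:
            keep = idx != i
            err += (y[i] - local_linear(x[keep], y[keep], x[i], h)) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h

# Example: one raw coefficient sequence observed at 15-min values of tau.
tau = np.linspace(8.0, 20.0, 49)
raw = 0.5 + 0.02 * tau + 0.05 * np.sin(tau)             # hypothetical estimates
h = loo_cv_bandwidth(tau, raw, [0.5, 1.0, 2.0])
smoothed = np.array([local_linear(tau, raw, t0, h) for t0 in tau])
```

A local linear fit reproduces exactly linear sequences for any bandwidth, which makes it a convenient sanity check before applying it to each pair $(k,j)$.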

3.4. Implementation algorithm of functional mixture predictions. Suppose there
is a newly observed trajectory $\{(t_j^*, Y^*(t_j^*)); t_j^* \le \tau\}$,
denoted by $Y^*_{\mathcal{S}(\tau)}$ for short. The algorithm for functional
mixture prediction that combines the functional classification procedure with
the functional prediction model is summarized as follows.

Step 1. Identification of cluster subspaces. Perform the subspace-projected
    functional clustering procedure according to criterion (2.2) to identify
    cluster subspaces, $\{\varphi_k^{(c)}\}_{k=1,\ldots,M_c}$,
    $c = 1,\ldots,K$, based on the training data set as discussed in
    Sections 2.1 and 2.3.

Step 2. Model fitting based on the historical or training data.

    (i) Obtain the multiclass logit model for cluster membership
        distributions. Obtain from Step 1 the regression coefficient
        estimates $\hat\gamma$ in (2.8).

    (ii) Fit the functional linear regression model. Fit the cluster-specific
        functional linear regression models and obtain the regression
        coefficient estimates $\tilde\beta^{(c)}_{\tau,kj}$ as a smoothed
        version of (3.5).

Step 3. Prediction of the future traffic flow trajectory for a new and
    partially observed $Y^*_{\mathcal{S}(\tau)}$, conditional on clusters.

    (i) Predict the posterior membership probability of
        $Y^*_{\mathcal{S}(\tau)}$ associated with each cluster. Calculate the
        relative distances $d^{*(c)}_{\mathcal{S}(\tau)}$ for the given
        $Y^*_{\mathcal{S}(\tau)}$ in (2.10) and obtain the posterior
        probability $\Pr(C = c \mid Y^*_{\mathcal{S}(\tau)})$ in (2.11).

    (ii) Predict the unobserved trajectory $Y^*_{\mathcal{T}(\tau)}$
        conditioning on each of the clusters. Obtain the cluster-specific
        functional prediction model fit
        $\hat E(Y^*_{\mathcal{T}(\tau)} \mid Y^*_{\mathcal{S}(\tau)}, C = c)$
        in (3.6).

Step 4. Prediction of traffic flow trajectory by the functional mixture
    prediction model. Calculate the predicted trajectory
    $\hat E(Y^*_{\mathcal{T}(\tau)} \mid Y^*_{\mathcal{S}(\tau)})$ in (3.7)
    using the results of $\Pr(C = c \mid Y^*_{\mathcal{S}(\tau)})$ and
    $\hat E(Y^*_{\mathcal{T}(\tau)} \mid Y^*_{\mathcal{S}(\tau)}, C = c)$ and
    obtain the bootstrap prediction intervals.

Details on constructing the bootstrap prediction intervals in Step 4 are
provided in Supplementary Material B [Chiou (2012)].
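The final combination in Step 4 is a probability-weighted average of the cluster-specific predictions from Step 3. The sketch below assumes those predictions and the posterior probabilities have already been computed; the function name `mixture_prediction` and the toy numbers are hypothetical, and the bootstrap prediction intervals are not covered here.

```python
import numpy as np

def mixture_prediction(cluster_preds, posterior):
    """Weighted average of cluster-conditional predictions.

    cluster_preds : (K, n_t) array, row c is the prediction given cluster c
    posterior     : length-K posterior membership probabilities
    """
    cluster_preds = np.asarray(cluster_preds, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    if posterior.shape[0] != cluster_preds.shape[0]:
        raise ValueError("one probability per cluster is required")
    if np.any(posterior < 0) or not np.isclose(posterior.sum(), 1.0):
        raise ValueError("posterior must be a probability vector")
    # Sum over clusters of P(C = c | observed part) * prediction given c.
    return posterior @ cluster_preds

# Toy example with K = 3 clusters and two future time points.
preds = np.array([[1.0, 1.0], [3.0, 5.0], [10.0, 0.0]])
combined = mixture_prediction(preds, [0.5, 0.5, 0.0])
```

Because the weights are recomputed each time the current time advances, the combined prediction can shift between cluster patterns as more of the day is observed.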

4. Analysis of traffic flow patterns. The sample data set of daily traffic 
flow trajectories from Section 1.1 is divided into a training data set (70 days) 
and a test data set (14 days) to examine the predictive performance of our 
model. Clusters of the traffic flow patterns from the training data are iden- 
tified based on subspace projection using the proposed subspace-projected 
functional clustering method according to criterion (2.2). The implementa- 
tion of the functional forward testing procedure of Li and Chiou (2011) leads 
to the choice of 3 clusters. Table 1 summarizes the empirical probabilities of 
rejecting the null hypotheses for K = 2, 3, 4, based on 200 bootstrap samples.
The p-values with reference to the predetermined level of significance 0.05,
adjusted for multiple comparisons, indicate that when K = 2 and 3, the clusters
are all significantly distinct, while Clusters 1 and 4 when K = 4 are not
significantly different in terms of the mean functions and the eigenspaces.

Table 1
Empirical probabilities of rejecting the null hypotheses H01 and H02,
respectively, based on 200 bootstrap samples

Number of clusters   Clusters   H01     H02: S^(c) = S^(d)
2                    1 vs. 2    0.000   0.010
3                    1 vs. 2    0.000   0.005
                     1 vs. 3    0.000   0.005
                     2 vs. 3    0.005   0.015
4                    1 vs. 2    0.000   0.055
                     1 vs. 3    0.000   0.030
                     1 vs. 4    0.155   0.025
                     2 vs. 3    0.000   0.160
                     2 vs. 4    0.000   0.035
                     3 vs. 4    0.000   0.000

The cluster memberships displayed in Figure 3 show that Cluster 1 contains
mostly weekends (left panel), with 90% being holidays including weekends
(right panel). Cluster 2 consists entirely of weekdays, including Mondays
through Thursdays. Cluster 3 comprises mostly weekdays, especially Fridays
(left panel). The mean functions of the three clusters and the overall
trajectories are displayed in Figure 4. While Cluster 1 has a higher mean
traffic flow rate than the other two clusters, Clusters 2 and 3 have
relatively close mean flow rates in terms of shape and magnitude until 11:00,
and they diverge thereafter with a higher mean flow rate in Cluster 3. The
observed trajectories along with the corresponding covariance functions and
leading eigenfunctions are shown in Figure 5. The variability of Cluster 1 is
higher than that of the other two clusters, while Cluster 2 has the lowest
variability. The peak flow rate in Cluster 1 lasts from 07:00 to 17:00 and the
three leading principal component functions explain 77.16%, 11.14% and 5.57%
of total variability. The trajectories in Cluster 2 have a relatively uniform
pattern with the major peak flow rate at around 11:00. Cluster 3 indicates a
high variability of flow rates occurring after 18:00. The mean integrated
prediction errors are defined as
$n_c^{-1}\sum_{i=1}^{n_c}\int_0^T \{Y_i(t) - \hat Y_i^{(c)}(t)\}^2\,dt$, where
$\hat Y_i^{(c)}(t) = \hat\mu^{(c)}(t) + \sum_{j=1}^{M_c}\hat\xi^{(c)}_{ij}\hat\varphi^{(c)}_j(t)$,
$Y_i(t)$ is the observed trajectory and $n_c$ is the number of trajectories in
Cluster $c$. These are 327.3, 78.8 and 122.4 for Clusters 1-3, respectively.
Prediction using the overall trajectories without clustering, in contrast,
returned an error of 300.5, indicating a substantial reduction in prediction
errors for Clusters 2 and 3 when the heterogeneity of cluster patterns is
taken into account.

Fig. 3. Frequency plots of cluster labels by days of the week (left panel) and
by holidays and nonholidays (right panel) for Clusters 1, 2 and 3 (left, middle
and right groups).

Fig. 4. Overall and cluster-specific mean functions of the training data of
daily traffic flow rates.

Fig. 5. Estimated mean functions (left column) superimposed on the observed
trajectories, covariance functions (middle column) and the corresponding
eigenfunctions (right column) of Clusters 1-3 (from top to bottom) based on the
training data of daily traffic flow trajectories.

The model fits of the multiclass logistic regression listed in Supplementary
Material C [Chiou (2012)] are used in predicting the unobserved trajectory for
an up-to-date and partially observed trajectory. Given a newly observed
trajectory from the test data up to the current time $\tau$, to consider
different flow patterns, we predict the posterior probabilities for each of the
associated clusters by functional classification based on the multiclass
logistic regression model in (2.11), using the fitted regression coefficients
in (2.8) with the relative distances (2.10) as the covariate. The posterior
probabilities for some test samples are illustrated in Figure 6 with the values
of $\tau$ progressing from 08:00 to 20:00 by 15-min intervals. In Figure 6(a),
the predicted membership probabilities of Test sample 1 degenerate to one for
all values of $\tau$. The associated predicted trajectories are shown in
Figure 2(a)-(c). The time-varying prediction intervals, which are wider closer
to $\tau$ and taper off toward the end, depend on Cluster 1's variability
pattern as illustrated in Figure 5 (top panels). In contrast, Figure 6(b) for
Test sample 2 indicates a more complex situation where the predicted membership
distributions change with $\tau$, with the associated predicted trajectories
shown in Figure 2(d)-(f). In this case, using the early trajectory information
up to $\tau$ = 8:00 may lead to misclassification, which makes it difficult to
predict its future trajectory accurately. This issue is resolved as $\tau$
moves onward. The wider prediction bands with $\tau$ at 12:00 and 16:00, in
comparison with that at 8:00, reflect the fact that the variability of
Cluster 3's traffic flow trajectories is larger in the afternoon as illustrated
in Figure 5 (bottom panels). Figure 6(c) indicates that the posterior cluster
membership probabilities of Test sample 3 alternate between Clusters 2 and 3,
owing to certain similarities in these two cluster patterns, and the predicted
cluster membership remains with Cluster 3 after around 18:00. Given that the
actual cluster memberships are unknown, the accuracy of functional
classification for the up-to-date and partially observed trajectories in the
test data will be investigated via a simulation study in Section 6.

Fig. 6. The predicted cluster membership distribution for Clusters 1-3
(indicated in blue, green and red) as a function of the "current time" $\tau$
(per 15 min) for samples from the test data based on the trajectories observed
up to $\tau$. Panels: (a) Test sample 1, (b) Test sample 2, (c) Test sample 3;
horizontal axes: $\tau$ (hour).

5. Traffic flow prediction. In predicting the unobserved traffic flow
trajectories, we also investigate the effects on the prediction performance of
the interval length prior to time $\tau$ and the future interval length after
it. To this end, we define
$\mathcal{S}(\tau;\omega) = [\max(0,\tau-\omega),\tau]$, where $\omega$ is the
length of the known interval to be used in prediction calculations, and
$\mathcal{T}(\tau;\kappa) = [\tau,\min(\tau+\kappa,T)]$, where $\kappa$ is the
length of the unknown interval to be predicted from time $\tau$ onward. In the
test data, given a sample $Y^*$ observed up to time $\tau$, denoted by
$Y^*_{\mathcal{S}(\tau)}$, we define the mean integrated prediction error
(MIPE) as the performance measure of predicting $Y^*_{\mathcal{T}(\tau)}$. This
is expressed as

$$\mathrm{MIPE}(\tau,\omega,\kappa) = \sum_{i=1}^{n_p} n_p^{-1}\int_0^{\min(\kappa,\,T-\tau)} \bigl\{Y^*_{i,\mathcal{T}(\tau)}(t) - \hat Z^*_{i,\mathcal{T}(\tau)}(t)\bigr\}^2\,dt,$$

where $Y^*_{i,\mathcal{T}(\tau)}(t) = Y^*_i(\tau+t)$,
$\hat Z^*_{i,\mathcal{T}(\tau)}(t)$ is obtained by (3.7) and $n_p$ is the
number of trajectories in the test data. For ease of comparisons across
different values of $\tau$, let $\tau_s = \max(0,\tau-\omega)$ and
$\tau_e = \min(\tau+\kappa,T)$, for $\omega > 0$ and $\kappa > 0$. We define
the total mean integrated prediction error (TMIPE) for the overall prediction
performance by

$$\mathrm{TMIPE}(\omega,\kappa) = \int_{T_s}^{T_e} \mathrm{MIPE}(\tau,\omega,\kappa)\,d\tau, \tag{5.1}$$

where $T_s$ and $T_e$ are the smallest and the largest values, respectively,
selected with respect to the times, $\tau$, on the domain $[0,T]$. In this
study, $T = 24$ (hours) and we set $T_s = 8$ and $T_e = 20$. For notational
convenience, we let $\kappa^* = 24 - \tau$ and $\omega^* = \tau$,
$T_s \le \tau \le T_e$, such that $\kappa^*$ denotes the interval length from
the current time to the end of the day and $\omega^*$ denotes the maximal
length of the past trajectory information available for prediction.
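The window definitions and the two error measures can be sketched numerically as follows. The sketch assumes that MIPE averages the integrated squared error over the test trajectories without any further normalization by the window length, and that all integrals are approximated by the trapezoidal rule on a regular grid; the function names and the constant `T_END` are hypothetical.

```python
import numpy as np

T_END = 24.0  # length of the daily domain in hours, as in the text

def known_window(tau, omega):
    """Interval of past information used for prediction."""
    return max(0.0, tau - omega), tau

def prediction_window(tau, kappa):
    """Interval to be predicted, truncated at the end of the day."""
    return tau, min(tau + kappa, T_END)

def trapezoid(y, x):
    """Trapezoidal rule for samples y on grid x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def mipe(y_true, y_pred, t_grid):
    """Average over test trajectories of the integrated squared error.

    y_true, y_pred : (n_p, len(t_grid)) arrays on the prediction window
    """
    sq_err = (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    return float(np.mean([trapezoid(row, t_grid) for row in sq_err]))

def tmipe(mipe_values, tau_grid):
    """Integrate MIPE over the selected range of current times."""
    return trapezoid(mipe_values, tau_grid)
```

For example, `prediction_window(22.0, 4.0)` is truncated to end at hour 24, which is why the effective window length shrinks late in the day.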

5.1. Results and comparisons of traffic flow prediction. In this study we 
investigate the prediction performance by comparing the following methods: 

• FP: Functional prediction based on functional linear regression using the
same setting described in Section 3.1 but without considering clusters of
different traffic flow patterns;

• FMP_H: Functional prediction using the proposed functional mixture
prediction model except that the posterior membership probabilities (2.6)
degenerate to zero or one such that
$\sum_{c=1}^{K} P(C = c \mid Y^*_{\mathcal{S}(\tau)}) = 1$ (where the
subscript H reflects the so-called hard classification);

• FMP_S: Functional prediction using the proposed functional mixture
prediction model (where the subscript S reflects the so-called soft
classification or probabilistic classification).
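The difference between the hard and soft variants listed above comes down to how the posterior membership probabilities are used as weights. The short sketch below illustrates this under the assumption that hard classification assigns all weight to the most probable cluster; the function name `harden` and the numbers are hypothetical.

```python
import numpy as np

def harden(posterior):
    """Replace soft membership probabilities by a 0/1 vector at the arg max."""
    posterior = np.asarray(posterior, dtype=float)
    hard = np.zeros_like(posterior)
    hard[np.argmax(posterior)] = 1.0
    return hard

# Hypothetical posterior probabilities and cluster-specific predictions
# at two future time points.
posterior = np.array([0.2, 0.5, 0.3])
cluster_preds = np.array([[100.0, 120.0], [200.0, 180.0], [300.0, 260.0]])

soft_prediction = posterior @ cluster_preds           # weighted by probabilities
hard_prediction = harden(posterior) @ cluster_preds   # single most likely cluster
```

When the posterior is already degenerate the two predictions coincide; they differ most when two clusters are nearly equally likely.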

To examine prediction performance under various situations, we consider a wide
range of values of $\tau$ along with various values of $\omega$ and $\kappa$ as
defined in $\mathcal{S}(\tau;\omega)$ and $\mathcal{T}(\tau;\kappa)$. Table 2
indicates that the proposed FMP_S is robust, generally outperforming the other
two (FP and FMP_H) under various values of $\omega$ and $\kappa$. Figure 7
(left panel) indicates that small values of $\omega$ (1, 2, 4) give a similar
performance and it is not surprising that the performance for larger values of
$\kappa$ is worse. For fixed values of $\kappa$ (right panel), TMIPE as a
function of $\omega$ generally shows a positive slope as it moves away from the
origin, with a minimum at $\omega = 5$ (for $\kappa = 4$, 8 and $\kappa^*$) or
$\omega = 6$ (for $\kappa = 1$) from Table 2. The trend is relatively flat, but
becomes steeper when $\omega = \omega^*$. This discrepancy is more pronounced
with increasing $\kappa$. A possible explanation is that the flow trajectory
patterns in Clusters 2 and 3 are close in shape and magnitude until noon and
diverge thereafter and, thus, using larger $\omega$ with more past information
may not significantly improve the overall prediction accuracy. In the
literature, Şentürk and Müller (2010) considered the length of past data to be
used for prediction and suggested the optimal length using a data-adaptive
criterion that minimizes the absolute prediction error. Our empirical results
also suggest the use of a data-adaptive criterion to choose the length of past
data. Additionally, comparisons for the prediction performance between the
methods with fixed values of $\omega$ as illustrated in Figure 8 (with
$\omega = 1$ on the left and $\omega = \omega^*$ on the right) reinforce the
conclusion that FMP_S outperforms FP and FMP_H. In addition to the 3-cluster
prediction performance illustrated above, results of the 2- and 4-cluster
prediction performances are illustrated in Supplementary Material C [Chiou
(2012)] for comparisons. These results also support our choice of the 3-cluster
model, which outperforms the 2- and 4-cluster models.

Table 2
Performance comparisons for FP, FMP_H and FMP_S based on TMIPE (×10^?) under
various values of $\kappa$ and $\omega$

                                          $\omega$
Method         $\kappa$       1      2      3      4      5      6   $\omega^*$
FP             1           4.12   4.92   4.84   4.92   5.30   5.68   5.82
               4           7.42   7.71   7.69   8.10   8.60   9.06   8.95
               8           9.92  10.26  10.35  10.79  11.27  11.71  11.65
               $\kappa^*$ 12.34  12.79  12.94  13.41  13.91  14.36  14.30
FMP_H          1           3.48   3.22   3.20   3.33   3.53   3.52   3.14
(3 clusters)   4           5.00   4.62   4.73   4.91   5.20   5.26   4.79
               8           8.88   8.44   8.48   8.68   8.97   9.07   8.81
               $\kappa^*$ 12.24  11.82  11.87  12.06  12.36  12.47  12.33
FMP_S          1           3.31   3.26   3.07   2.93   2.87   2.80   2.81
(3 clusters)   4           4.37   4.14   4.05   3.98   3.80   3.86   4.18
               8           5.80   5.64   5.54   5.49   5.41   5.61   6.68
               $\kappa^*$  7.97   7.94   7.84   7.86   7.71   7.89   9.95

Fig. 7. Performance comparisons for FMP_S, based on TMIPE (5.1), displayed as a
function of $\kappa$ (left) with $\omega$ fixed at 1, 2, 4 and $\omega^*$ and
as a function of $\omega$ (right) with $\kappa$ fixed at 1, 4, 7, 10 and
$\kappa^*$.

5.2. Comparisons with other methods. We compare the functional mixture
prediction approach with an existing method that could also be fitted into our
functional mixture prediction framework. One possible approach is to treat the
unobserved future trajectory for a partially observed trajectory as missing in
the entire trajectory. That is, we may replace (3.6) in Step 3(ii) with the
model

$$\hat Z^{*(c)}_{\mathcal{T}(\tau)}(t) = \hat E\bigl(Z^*_{\mathcal{T}(\tau)}(t) \mid Y^*_{\mathcal{S}(\tau)}, C = c\bigr) = \hat\mu^{(c)}(t) + \sum_{j=1}^{M_c} \hat\xi^{*(c)}_j\,\hat\varphi^{(c)}_j(t) \tag{5.2}$$

for all $t \in \mathcal{T}(\tau)$, where the estimated mean function
$\hat\mu^{(c)}(t)$ and the estimated eigenfunctions of the covariance kernel
$\hat\varphi^{(c)}_j(t)$ of cluster $c$ are obtained by the training data set
in the clustering step with the corresponding domain $\mathcal{T}(\tau)$. The
key step is to estimate the functional principal component scores
$\hat\xi^{*(c)}_j$, which cannot be obtained easily since the trajectory $Y^*$
is only partially observed. An existing method that can deal with this
situation makes use of the expectation of the posterior distribution in
Proposition 1 of Zhou, Serban and Gebraeel (2011), assuming that the prior
distribution of the scores is Gaussian, for an application to degradation
modeling. This formula coincides with the conditional expectation approach in
equation (4) of Yao, Müller and Wang (2005a) under Gaussian assumptions,
although they are different in terms of statistical inference. We term this
method the Functional Principal Component Prediction (FPCP) approach.

We apply FPCP to the proposed functional mixture prediction algorithm including
the cases with and without clustering/classification considerations for
comparisons, including FPCP, FPCP_H and FPCP_S that are parallel to FP, FMP_H
and FMP_S. The results shown in Table 3, in comparison with the results in
Table 2, indicate that the functional mixture prediction approach in
conjunction with functional linear regression outperforms the FPCP approach.
Additional results for the 2- and the 4-cluster models are also provided in
Supplementary Material C [Chiou (2012)] for comparison.

Table 3
Performance comparisons for FPCP, FPCP_H and FPCP_S based on TMIPE (×10^?)
under various values of $\kappa$ and $\omega$

                                          $\omega$
Method         $\kappa$       1      2      3      4      5      6   $\omega^*$
FPCP           1          14.30  13.94  13.63  12.62  10.49   8.48   7.69
               4          17.43  16.70  15.37  13.57  11.90  11.11  10.77
               8          17.97  16.98  15.96  14.93  14.10  13.73  13.30
               $\kappa^*$ 19.27  18.59  17.88  17.13  16.50  16.19  15.66
FPCP_H         1           7.83   8.62   8.90   8.90  10.57  10.67   2.85
(3 clusters)   4          10.72  11.35  11.99  11.63  11.21   9.75   4.80
               8          12.57  12.62  12.28  11.94  12.05  11.65   9.19
               $\kappa^*$ 14.35  14.36  14.24  14.13  14.14  13.94  12.48
FPCP_S         1           6.26   7.29   7.54   7.73   9.41   9.44   3.07
(3 clusters)   4           8.73   9.84  10.79  10.86  10.61   9.86   4.87
               8          10.13  11.16  11.66  11.65  11.60  11.28   8.80
               $\kappa^*$ 12.12  12.95  13.23  13.12  13.14  13.14  12.37
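The score estimation step of the FPCP approach can be sketched using the Gaussian conditional expectation formula of the kind cited above, in which the scores are estimated from the observed part of the trajectory only. The sketch assumes the cluster mean, eigenfunctions, eigenvalues and measurement error variance are given on a discrete grid; the function names `conditional_scores` and `fpcp_predict` and the array layouts are hypothetical.

```python
import numpy as np

def conditional_scores(y_obs, mu_obs, phi_obs, lam, sigma2):
    """Conditional expectation of the scores given a partially observed curve.

    y_obs   : observed values, shape (n_obs,)
    mu_obs  : mean function at the observed times, shape (n_obs,)
    phi_obs : eigenfunctions at the observed times, shape (n_obs, M)
    lam     : eigenvalues (score variances), shape (M,)
    sigma2  : measurement error variance
    """
    lam = np.asarray(lam, dtype=float)
    # Covariance of the observed vector: Phi diag(lam) Phi^T + sigma2 * I.
    cov_y = (phi_obs * lam) @ phi_obs.T + sigma2 * np.eye(len(y_obs))
    # E[xi | y] = diag(lam) Phi^T cov_y^{-1} (y - mu) under joint normality.
    return lam * (phi_obs.T @ np.linalg.solve(cov_y, y_obs - mu_obs))

def fpcp_predict(scores, mu_future, phi_future):
    """Plug the estimated scores into the expansion on the unobserved part.

    mu_future : mean at the future times, shape (n_future,)
    phi_future: eigenfunctions at the future times, shape (n_future, M)
    """
    return mu_future + phi_future @ scores
```

The same estimate can be written as a ridge-type regression of the centered observations on the eigenfunctions, with a penalty proportional to the inverse eigenvalues, which shows how components with small variance are shrunk toward zero.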

6. Simulation. We implement a Monte Carlo simulation to evaluate the
performance of the functional clustering and classification procedures as well
as the functional prediction accuracy. We simulate the scenario of the real
traffic flow trajectories analyzed in the previous sections. We generate a
training data set and a test data set for each simulation run using the
estimated results of the 3-cluster traffic flow trajectories as the true models
with a total of 100 simulation replicates. The numbers of curves $n_c$ are 21,
31 and 18 for Clusters 1-3 in each training data set and are 3, 8 and 3 in each
test data set as in the previous analysis. The synthetic curves of cluster $c$
are generated by the model

$$Y_i^{(c)}(t_j) = \mu^{(c)}(t_j) + \sum_{k=1}^{M_c} \xi_{ik}^{(c)}\,\varphi_k^{(c)}(t_j) + \epsilon_{ij}^{(c)},$$

for $i = 1,\ldots,n_c$, where $\xi_{ik}^{(c)}$ are normal random variates with
a mean of zero and variance $\lambda_k^{(c)}$ and the random measurement errors
$\epsilon_{ij}^{(c)}$ are independent and follow a normal distribution with a
mean of zero and variance $\sigma^2_{(c)}$. The recording times $t_j = j/4$ for
$j = 1,\ldots,96$ mimic the 15-min recording time interval. The quantities
$\varphi_k^{(c)}$, $\lambda_k^{(c)}$ and $\sigma^2_{(c)}$ use the model
estimates of our real traffic flow data analysis. The numbers of components
$M_c$ are determined by the numbers of the estimated $\lambda_k^{(c)}$ that are
strictly positive. Further details of the simulated models regarding the
underlying functions $\mu^{(c)}$ and $\varphi_k^{(c)}$, along with a sample of
synthetic trajectories, are displayed in Supplementary Material D [Chiou
(2012)]. The clustering results of this simulated sample, including the
estimated mean function and the eigenfunctions along with the covariance
functions, are also illustrated.
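The data-generating model above can be sketched in a few lines. The mean function, eigenfunctions, eigenvalues and error variance used here are made-up placeholders, not the estimates from the traffic flow analysis, and the function name `simulate_cluster` is hypothetical; only the recording grid and the cluster size of 21 follow the text.

```python
import numpy as np

def simulate_cluster(mu, phi, lam, sigma2, n, rng):
    """Draw n synthetic curves: mean + random scores * eigenfunctions + noise.

    mu     : mean function on the grid, shape (n_t,)
    phi    : eigenfunctions on the grid, shape (M, n_t)
    lam    : score variances, shape (M,)
    sigma2 : measurement error variance
    """
    lam = np.asarray(lam, dtype=float)
    scores = rng.normal(0.0, np.sqrt(lam), size=(n, lam.size))
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n, mu.size))
    return mu + scores @ phi + noise

# Recording times t_j = j/4, j = 1, ..., 96 (15-min intervals over 24 hours).
t = np.arange(1, 97) / 4.0

# Placeholder model components (assumptions for illustration only).
mu = 100.0 + 50.0 * np.sin(np.pi * t / 24.0)
phi = np.vstack([
    np.sin(2 * np.pi * t / 24.0) / np.sqrt(12.0),   # unit L2 norm on [0, 24]
    np.cos(2 * np.pi * t / 24.0) / np.sqrt(12.0),
])

rng = np.random.default_rng(2012)
curves = simulate_cluster(mu, phi, lam=[400.0, 100.0], sigma2=25.0, n=21, rng=rng)
```

Repeating the call with a different size and model components for each cluster, and stacking the results, gives one simulated training or test data set.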

The average clustering error rates are 6.48% (with standard error 1.28%),
1.68% (0.45%) and 8.39% (2.01%) for Clusters 1-3 based on the 100 simulated
training data sets. The accuracy of classification for the future trajectory
to be predicted for a partially observed trajectory in the test data depends
on the value of $\tau$, the "current" time observed thus far. The average
classification error rate decreases with $\tau$, ranging from 27.5% at 8:00 to
7.7% at 20:00, implying that prediction accuracy increases with $\tau$.
Additional details regarding accuracy of clustering and classification are
compiled in Supplementary Material D [Chiou (2012)].

The prediction performances based on the proposed functional mixture
prediction (FMP) approach are summarized in Table 4. The method FMP_S^* is the
same as FMP_S, except that FMP_S^* assumes the cluster/classification
memberships are known, serving as the gold standard for prediction performance
comparisons. The results clearly demonstrate that FMP_S outperforms FP and
FMP_H, with relatively smaller values of TMIPE and the associated standard
errors indicating that FMP_S has better prediction accuracy and is quite
robust. In FMP_S, the prediction errors using $\omega = 1$ and $\omega = 2$ are
close to each other and perform the best under various values of $\kappa$. The
optimal selection of $\omega$ appears to be different from those obtained from
our real traffic flow data. Although the generated data based on the model
estimates may reach a high level of realism to traffic flow data, they may not
be able to capture the entire data features such as outlying curves that could
influence the prediction performance. In addition, classification errors of the
partially observed trajectory may also play a role in prediction. We also
compare the FMP method with the FPCP approach. The simulation results are
listed in Supplementary Material D [Chiou (2012)]. The results suggest larger
values of $\omega$ for minimal prediction errors. The intuition behind these
results is that FPCP treats the partial trajectory to be predicted as missing
values, especially when the data are more homogeneous within clusters and
contain fewer outlying curves as in the simulated data. Overall, the results
demonstrate that the proposed FMP approach outperforms the FPCP approach.

Table 4
Average TMIPE (×10^?) (with s.e. in parentheses) for FP, FMP_H, FMP_S and
FMP_S^* under various values of $\kappa$ and $\omega$ based on 100 simulation
replicates

                                               $\omega$
Method     $\kappa$     1             2             4             6             $\omega^*$
FP         1            2.99 (0.03)   3.44 (0.05)   3.90 (0.07)   4.40 (0.08)   4.87 (0.11)
           4            5.53 (0.09)   5.72 (0.10)   6.18 (0.12)   6.70 (0.14)   7.18 (0.17)
           8            6.77 (0.13)   6.92 (0.15)   7.33 (0.17)   7.80 (0.20)   8.26 (0.21)
           $\kappa^*$   7.50 (0.18)   7.67 (0.20)   8.08 (0.22)   8.52 (0.24)   8.97 (0.25)
FMP_H      1            2.88 (0.04)   3.17 (0.05)   3.42 (0.07)   3.58 (0.08)   3.61 (0.09)
           4            4.98 (0.11)   5.14 (0.12)   5.44 (0.15)   5.65 (0.17)   5.77 (0.18)
           8            6.00 (0.15)   6.12 (0.16)   6.42 (0.19)   6.66 (0.22)   6.89 (0.24)
           $\kappa^*$   7.18 (0.27)   7.47 (0.33)   8.02 (0.50)   8.41 (0.57)   8.80 (0.53)
FMP_S      1            2.80 (0.04)   3.09 (0.05)   3.30 (0.07)   3.49 (0.07)   3.49 (0.07)
           4            4.88 (0.09)   4.91 (0.10)   5.26 (0.11)   5.44 (0.13)   5.42 (0.13)
           8            5.90 (0.14)   5.99 (0.14)   6.34 (0.16)   6.38 (0.17)   6.59 (0.18)
           $\kappa^*$   6.90 (0.19)   7.07 (0.19)   7.25 (0.22)   7.58 (0.23)   7.86 (0.25)
FMP_S^*    1            2.60 (0.03)   2.81 (0.04)   3.03 (0.05)   3.13 (0.06)   3.13 (0.06)
           4            4.11 (0.06)   4.18 (0.07)   4.40 (0.09)   4.48 (0.10)   4.46 (0.11)
           8            4.57 (0.09)   4.59 (0.09)   4.74 (0.11)   4.78 (0.12)   4.73 (0.13)
           $\kappa^*$   4.58 (0.10)   4.57 (0.10)   4.65 (0.12)   4.68 (0.13)   4.62 (0.13)

7. Concluding remarks and discussions. This study presents a methodological
framework for uncovering traffic flow patterns and predicting traffic flow.
The proposed functional data approaches, including classification and
prediction, identify clusters with similar traffic flow patterns, facilitating
accurate prediction of daily traffic flow. Although motivated by the subject of
traffic flow prediction, the proposed methodology is generally applicable and
transferable to the analysis and prediction of any longitudinally collected
functional data, such as city electricity usage or degradation studies in
manufacturing systems. The empirical results demonstrate that our proposed
method, functional mixture prediction, which combines functional prediction
with probabilistic functional classification, can work reasonably well to
predict traffic flow. We conclude that taking traffic flow patterns into
account can greatly improve prediction performance as long as the traffic flow
patterns can be satisfactorily identified.

In the literature of intelligent transportation systems, conditional expec- 
tation is commonly used as the measure of traffic flow prediction/forecast of 
a future trajectory at a future time point or short period. However, it may be 
interesting to consider probabilistic forecasts [Gneiting (2008)], which take 
the form of probability distributions over future trajectories. A probabilistic 
forecast may engender a new way of thinking about traffic flow prediction, 
which may give a better account of uncertainty in potential flow trajectories. 
In this study, our focus was on predicting a future trajectory in the form of
conditional expectation for an up-to-date and partially observed trajectory.
Under the functional mixture prediction framework, a mixture of predictive 
distributions of the potential trajectories could instead serve as an ensemble 
for probabilistic forecasting. However, substantial efforts would be needed to 
accomplish the goal of probabilistic forecasting for traffic flow trajectories. 

In addition to predictive accuracy, the real-time feature of traffic flow in- 
formation is important in traffic management. Given that the components of 
our proposed model are estimated based on historical data, as in the training 
data, the proposed method also serves as a real-time prediction approach 
for predicting the future unobserved traffic flow trajectory for a partially 
observed flow trajectory. The fact that real-time information is quickly and 
easily updated will facilitate the establishment of effective reporting systems 
for traffic flow prediction. Furthermore, this article discussed single-detector 
traffic prediction, a category crucial in supporting demand forecasting as 
required in practice by operational network models. Future research might
extend to multiple-detector traffic prediction, which will be important in
working toward the goal of better road network management.

SUPPLEMENTARY MATERIAL

Supplement to: "Dynamical functional prediction and classification, with
application to traffic flow prediction" by J.-M. Chiou
(DOI: 10.1214/12-AOAS595SUPP; .pdf). This supplement contains a PDF
(AOAS595SUPP.pdf) which is divided into four sections. Supplement A: Selection
of the number of clusters; Supplement B: Bootstrap prediction intervals;
Supplement C: Additional results for traffic flow prediction; Supplement D:
Additional simulation details and results.